ABSTRACT Soil microbial diversity represents the largest global reservoir of novel microorganisms and enzymes. In this study, we 
coupled functional metagenomics and DNA stable-isotope probing (DNA-SIP) using multiple plant-derived carbon substrates 
and diverse soils to characterize active soil bacterial communities and their glycoside hydrolase genes, which have value for in- 
dustrial applications. We incubated samples from three disparate Canadian soils (tundra, temperate rainforest, and agricultural) 
with five native carbon (12C) or stable-isotope-labeled (13C) carbohydrates (glucose, cellobiose, xylose, arabinose, and cellulose). 
Indicator species analysis revealed high specificity and fidelity for many uncultured and unclassified bacterial taxa in the heavy 
DNA for all soils and substrates. Among characterized taxa, Actinomycetales (Salinibacterium), Rhizobiales (Devosia), Rhodo- 
spirillales (Telmatospirillum), and Caulobacterales (Phenylobacterium and Asticcacaulis) were bacterial indicator species for the 
heavy substrates and soils tested. Both Actinomycetales and Caulobacterales (Phenylobacterium) were associated with metabo- 
lism of cellulose, and Alphaproteobacteria were associated with the metabolism of arabinose; members of the order Rhizobiales 
were strongly associated with the metabolism of xylose. Annotated metagenomic data suggested diverse glycoside hydrolase gene 
representation within the pooled heavy DNA. By screening 2,876 cloned fragments derived from the 13C-labeled DNA isolated 
from soils incubated with cellulose, we demonstrate the power of combining DNA-SIP, multiple-displacement amplification 
(MDA), and functional metagenomics by efficiently isolating multiple clones with activity on carboxymethyl cellulose and fluo- 
rogenic proxy substrates for carbohydrate-active enzymes. 

IMPORTANCE The ability to identify genes based on function, instead of sequence homology, allows the discovery of genes that 
would not be identified through sequence alone. This is arguably the most powerful application of metagenomics for the recov- 
ery of novel genes and a natural partner of the stable-isotope-probing approach for targeting active-yet-uncultured microorgan- 
isms. We expanded on previous efforts to combine stable-isotope probing and metagenomics, enriching microorganisms from 
multiple soils that were active in degrading plant-derived carbohydrates, followed by construction of a cellulose-based meta- 
genomic library and recovery of glycoside hydrolases through functional metagenomics. The major advance of our study was the 
discovery of active-yet-uncultivated soil microorganisms and enrichment of their glycoside hydrolases. We recovered positive 
cosmid clones in a higher frequency than would be expected with direct metagenomic analysis of soil DNA. This study has gener- 
ated an invaluable metagenomic resource that future research will exploit for genetic and enzymatic potential. 
Soil microorganisms catalyze Earth's biogeochemical reactions, 
including the degradation of organic matter and recycling of 
nutrients. Soils host diverse microhabitats with varied physico- 
chemical gradients and environmental conditions. In this context, 
soil microorganisms live in consortia, interacting physically and 
biochemically with other members of the soil biota (1). Attesting 
to the heterogeneity, interactivity, and connectivity of the soil 
niche, traditional culture-based techniques grossly underestimate 
microbial diversity. Readily cultured microorganisms typically 
represent a very small proportion of soil microbial communities 
(2); the "uncultured majority" harbor an enormous reservoir of 
uncharacterized organisms, genes, and enzymatic processes (3). 
An outstanding methodological question remains: how best to 
access the biotechnological potential contained within the DNA of 
soil's uncultured microorganisms? 

Degradation of plant organic matter by the combined action of 
glycoside hydrolase (GH) enzymes is an important soil function. 
The GH group of enzymes is distributed across a wide variety of 
organisms. They catalyze the hydrolysis of glycosidic bonds in 
complex carbohydrates (e.g., cellulose and hemicellulose) to re- 
lease simple sugars (e.g., pentoses and hexoses), and as a result, 
GHs include important enzymes for biotechnological applica- 
tions. Because glycosidic bonds are considered among the most 
stable linkages that occur naturally, GHs are credited as some of 
the most proficient catalysts (4). Recent research suggests a broad 
diversity of bacteria contribute to plant polymer degradation (5-8), 
supporting the use of cultivation-independent methods, such 
as metagenomics, as most strategic for the recovery of genes and 
enzymes from these microorganisms. 

TABLE 1 Location and physicochemical characteristics of the soil samples selected for DNA stable-isotope probing incubations^a 

| Sample | Location | Latitude and longitude | Bulk density (g/cm3) | Total carbon (% dry wt) | Inorganic carbon (% dry wt) | Organic carbon (% dry wt) | pH | Moisture (% dry wt) | Nitrogen (% dry wt) | Soil type |
|---|---|---|---|---|---|---|---|---|---|---|
| Arctic tundra (1AT) | Daring Lake, North-West Territories, Canada | 64°52'N, 111°35'W | 0.2 | 46.9 | BDL^b | 46.9 | 3.9 | 417.7 | 1.42 | Organic |
| Temperate rainforest (7TR) | Pacific coastal rainforest, Vancouver Island, Canada | 48°36'N, 124°13'W | 0.6 | 10.8 | BDL | 10.8 | 4.9 | 69.8 | 0.35 | Coarse sandy loam |
| Agricultural soil-wheat (11AW) | Elora Research Station, Ontario, Canada | 43°38'N, 80°24'W | 1.1 | 1.85 | 0.12 | 1.7 | 7.4 | 17.9 | 0.19 | Silt loam |

^a For more details, see http://www.cm2bl.org/. 
^b BDL, below detection limit. 

Metagenomics captures the genomes of environmental com- 
munity microbes, circumventing the need for cultivation and en- 
abling the exploration of microbial genetic diversity and biotech- 
nological potential (9). Metagenomic analyses have exposed new 
microbial pathways and reactions, yielding novel enzymes and 
products of economic importance. Given that metagenomic stud- 
ies demonstrate that the majority of total genetic diversity space 
remains unexplored, "it will be far more efficient and productive 
to seek new enzymes from metagenome libraries than to tweak the 
activities of existing ones" (10). Indeed, there are several recent 
examples of GHs (e.g., cellulases) recovered by functional screen- 
ing of metagenomic libraries from terrestrial environments (e.g., 
see references 11, 12, 13, and 14). These studies reflect a laborious 
limitation of bulk DNA metagenomic library construction: in the 
absence of suitable selections for phenotype, many clones (e.g., 
tens of thousands) must be screened prior to recovering targets of 
interest. In addition, recovered clones are theoretically the most 
abundant target genes in the microbial community of interest. 
Targeted metagenomic approaches, such as those involving an 
enrichment culture step (15), thus offer the potential to filter for 
sequences specific to an activity of environmental or industrial 
relevance. 

Stable-isotope probing (SIP) is a culture-independent method 
for targeting microorganisms that assimilate a particular growth 
substrate (16-18). For the analysis of genomic DNA of active 
organisms, a SIP substrate (e.g., 13C labeled or 15N labeled) is 
incorporated into the DNA (DNA-SIP) or RNA (RNA-SIP) of active 
organisms, and isopycnic ultracentrifugation can differentiate la- 
beled nucleic acids from an abundant background of unlabeled 
community genomes. Combining SIP with metagenomics pro- 
vides access to the genomes of less-abundant community mem- 
bers and offers insight into complex environmental processes, 
such as biodegradation (as reviewed in references 19, 20, and 21). 

Several studies have combined DNA-SIP and metagenomic 
sequencing to identify high proportions of genes from active 
microorganisms, such as those using glycerol (22), C1 compounds 
(23-26), and biphenyl (27, 28). Previous SIP studies reported that in an 
agricultural soil (clay loam soil, pH 6.6), cellulose was metabolized 
by Bacteroidetes, Chloroflexi, and Planctomycetes; cellobiose and 
glucose were degraded predominantly by Actinobacteria (8). The 
results also suggested that cellulolytic bacteria are different from 
saccharolytic bacteria and that oxygen availability defined the 
different taxonomic groups involved. Under anoxic conditions, 
cellulose was metabolized by Actinobacteria, Bacteroidetes, and 
Firmicutes; carbon from cellobiose and glucose was assimilated by 
Firmicutes. Others found that members of the Burkholderiales, 
Caulobacterales, Rhizobiales, Sphingobacteriales, Xanthomonadales, 
and Group 1 Acidobacteria were associated with three 
different soils amended with cellulose (29). A recent survey of active 
bacteria in an Arctic tundra sample found Clostridium and 
Sporolactobacillus involved in 13C-glucose assimilation and 
Betaproteobacteria, Bacteroidetes, and Gammaproteobacteria involved in the 
assimilation of carbon derived from 13C-cellulose (30). Others 
have used SIP and labeled cellulose to identify Dyella, Mesorhizo- 
bium sp., Sphingomonas sp., and an uncultured deltaproteobacte- 
rium (affiliated with Myxobacteria) linked to cellulose degrada- 
tion (6). 

The ability to identify genes based on function, instead of se- 
quence homology, is arguably the most powerful application of 
metagenomics for the recovery of novel genes (31) and a natural 
partner of the SIP approach for targeting active-yet-uncultured 
microorganisms (21). Previous studies were focused on the anal- 
ysis of single substrates or individual samples. In addition, only 
one previous study combined SIP and functional metagenomic 
screens, expressing labeled DNA within a surrogate Escherichia 
coli host for identification of enzyme activity (22). In this study, we 
expand on previous efforts to combine SIP and metagenomics (as 
reviewed in reference 21), enriching soil microorganisms active in 
degrading plant-derived carbohydrates and screening GHs 
through activity-based functional metagenomics. We combined 
SIP, high-throughput sequencing of labeled 16S rRNA genes 
and metagenomic DNA, multiple-displacement amplification 
(MDA), and functional metagenomics to identify active micro- 
organisms and associated GH enzymes. We also isolated GH- 
positive clones from a cosmid library in a much higher frequency 
than would be expected with traditional efforts using conven- 
tional metagenomics. 

RESULTS AND DISCUSSION 

Characterization of active soil bacteria. We used DNA-SIP as a 
targeted approach for enriching active soil microorganisms in- 
volved in the metabolism of five plant-derived carbohydrates 
(glucose, cellobiose, xylose, arabinose, and cellulose). Three 
disparate soil samples were obtained from the CM2BL soil collection 
based on maximal physicochemical diversity (Table 1) 
(http://www.cm2bl.org/). In particular, soil pH was low for the Arctic 
tundra and temperate rainforest soil samples, suggesting that the 
microbial composition and diversity of these two samples would 
be fundamentally different from those in agricultural soil (32, 33). 
The water-filled pore space (WFPS) was maintained between 50% 
and 60% to avoid decreased aerobic microbial activity at WFPS 
values of >60% (34, 35). 

Because 13C-labeled cellulose was commercially unavailable at 
the time of this research, both native cellulose and 13C-labeled 
cellulose were produced as the substrates for SIP incubations by 
Gluconacetobacter xylinus, generating predominantly amorphous 
cellulose (36), which is more readily degraded than crystalline 
cellulose (37). To ensure detectable labeling, similar to a previous 
experimental approach (8), glucose, cellobiose, arabinose, and xy- 
lose were added weekly (1.5 mmol of C) for 3 weeks, reaching 
levels approximately 5 to 500 times higher than those normally 
detected in soils (38, 39). Although substrate concentrations were 
higher than typical bulk soil concentrations, higher polysaccha- 
ride substrate concentrations would be expected in the root rhi- 
zosphere and in areas of active plant matter decomposition (as 
reviewed in reference 39), suggesting that our incubation condi- 
tions would not be unrealistic for some naturally occurring soils. 
These concentrations were chosen to ensure that labeled isotope 
was more abundant than endogenous soil carbon sources for the 
success of DNA-SIP, enabling the separation and purification of 
labeled DNA for subsequent molecular analyses (16, 40). Similar 
substrate concentrations and incubation times with glucose and 
cellulose were used previously (30), demonstrating minimal-yet- 
detectable labeling of DNA in an Arctic tundra soil sample. 

Metabolism of labeled substrates in DNA-SIP incubations was 
confirmed by higher headspace CO2 production in all 
substrate-amended serum vials compared to uninoculated controls for each 
of the three soils (Fig. 1). In all cases, cellulose-amended vials 
demonstrated reduced CO2 production compared to the other 
substrates, further justifying an extended incubation time for this 
comparably recalcitrant substrate. The average amount of CO2 
released after 6 days was 13% of the headspace, which, after 
subtraction of the average CO2 produced in uninoculated vials, was 
approximately equivalent to 1.4 mmol of carbon. This represents 
93% of the total weekly carbon added (~1.5 mmol of carbon). 
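
The carbon balance quoted above can be checked with a short calculation. This is only an illustrative sketch: the function name is hypothetical, and the two inputs are the rounded values reported in the text (about 1.4 mmol of net CO2 carbon against about 1.5 mmol of substrate carbon added per week).

```python
def percent_carbon_respired(net_co2_carbon_mmol: float,
                            added_carbon_mmol: float) -> float:
    """Return CO2 carbon released as a percentage of substrate carbon added.

    Both arguments are in mmol of carbon. The helper is hypothetical and
    simply restates the ratio described in the text.
    """
    if added_carbon_mmol <= 0:
        raise ValueError("added carbon must be positive")
    return 100.0 * net_co2_carbon_mmol / added_carbon_mmol


# Rounded values reported in the text (control-corrected CO2 carbon after
# 6 days, and the weekly substrate carbon addition).
net_co2_carbon = 1.4   # mmol C
weekly_addition = 1.5  # mmol C

print(round(percent_carbon_respired(net_co2_carbon, weekly_addition)))
```

With the rounded inputs, the ratio reproduces the reported value of about 93%.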

In addition to monitoring CO2 production in all vials, separate 
soil incubations were prepared with a defined helium-oxygen 
headspace and glucose amendment in order to monitor O2 
consumption. As expected, the addition of glucose stimulated O2 
consumption, but the headspace remained oxic for each of the weekly 
incubation periods over the first 3 weeks (see Fig. S1 in the 
supplemental material), indicating that weekly aeration of 
experimental vials was sufficient to deplete CO2 and replenish O2. 
Maintaining oxic conditions was important to ensure that the DNA-SIP 
incubation recovered DNA from microorganisms involved in 
aerobic degradation of complex carbohydrates in addition to 
capturing DNA from microorganisms involved in anaerobic metabolism 
(41). Indeed, recent oxic incubations demonstrated activity of 
anaerobic Clostridia (8, 30, 42), presumably because anoxic 
microenvironments exist even within oxic experimental microcosms. 

Confirmation of isotope labeling. At the two time points of all 
incubations (1 and 3 weeks for all substrates, except for cellulose, 
which was sampled at 3 and 6 weeks), DNA was retrieved for the 
analysis of bacterial community composition by agarose gel 
electrophoresis and denaturing gradient gel electrophoresis (DGGE) 
(43). All DNA extracts from microcosm soils were subjected to 
density gradient ultracentrifugation and recovered in 12 fractions, 
which were analyzed in agarose gels. The results demonstrated 
that all soils possessed more DNA in 13C-incubated heavy 
fractions (i.e., 1 to 7) than in 12C-control fractions (i.e., 8 to 12) from 
glucose, cellobiose, arabinose, and xylose SIP incubations (see 
Fig. S2 to S6 in the supplemental material). For cellulose, only 
temperate rainforest and agricultural soil incubations resulted in 
heavier DNA visible in agarose gels corresponding to 13C-labeled 
sample heavy DNA fractions (see Fig. S6) for the 6-week time 
point. Similar results were observed for all earlier time points but 
with less DNA associated with heavy fractions for 13C-incubated 
samples compared to the later time points (data not shown). 
Although extended incubation times were important, one caveat of 
extended incubation times for SIP incubations (e.g., for cellulose) 
is that labeled carbon might have been distributed more broadly 
within the microbial community, which may result in less-specific 
enrichment of substrate-degrading microbial genomes in the 
resulting data and libraries. 

FIG 1 Carbon dioxide production over time (days) for Arctic tundra (1AT) (A), temperate 
rainforest (7TR) (B), and agricultural (11AW) (C) soils. Soil samples were 
amended with labeled (13C) or unlabeled (12C) substrates, and serum bottles 
were aerated weekly to replenish oxygen and deplete carbon dioxide. The 
"control" represents a soil sample incubated without substrate. 

The presence of distinct fingerprint profiles in heavy fractions 
for 13C-incubated samples, but not for the corresponding 
12C-control fractions, demonstrates isotopic enrichment of nucleic 
acids (16). Bacterial DGGE fingerprints corresponding to all 
late-time-point fractions demonstrated unique patterns associated 
with the heavy fractions (e.g., fractions 1 to 7) for all 
13C-incubated SIP microcosms (see Fig. S2 to S6 in the supplemental 
material). Although some cross-gradient fingerprint variations 
were associated with 12C-control DNA, these differences were 
likely GC content shifts because they were pronounced only in the 
lightest fractions (e.g., fractions 10 to 12) and were distinct from 
shifts associated with fractionated 13C-DNA. Substrate- and 
soil-specific heavy fraction patterns were consistent for early- and 
late-time-point samples (data not shown), which indicated that 
detected active bacteria were stable over time rather than changing 
due to food web dynamics (40). 

Heavy DNA fingerprints were used to identify fractions 
containing 13C-labeled DNA for subsequent 16S rRNA gene 
sequencing, bulk DNA sequencing, and functional metagenomics. Based 
on DGGE patterns, we identified fraction 5 and/or 6 as being 
representative of heavy DNA and fraction 10 as representing light 
DNA for all soils, substrates, and incubation times (see Fig. S2 to 
S6 in the supplemental material). Although fractions 1 to 5 also 
may have captured DNA from labeled microorganisms, these 
fractions were not analyzed further because the vanishingly small 
proportions of DNA recovered from these gradient fractions 
would have made PCR and subsequent metagenomic library 
preparation problematic. 

Taxonomic characterization of heavy DNA. We selected rep- 
resentative gradient fractions from all soils, substrates, and incu- 
bation times for profiling of the bacterial V3 region of 16S rRNA 
genes. Based on DGGE data, we selected fractions 6 (heavy) and 10 
(light) for Arctic tundra and fractions 5 (heavy) and 10 (light) for 
temperate rainforest and the agricultural soil. In addition, we se- 
quenced V3 regions of 16S rRNA genes from DNA extracted from 
the initial soil samples used to establish SIP incubations to deter- 
mine whether light fractions resembled the original soil commu- 
nity as would be expected. Following paired-end-read assembly, 
we analyzed 630,000 assembled sequences (10,000 sequences per 
sample) using an AXIOME management of the QIIME pipeline 
and additional custom analyses (e.g., multiresponse permutation 
procedure [MRPP] and indicator species analysis). Good's cover- 
age (44) for the heavy fraction samples ranged from 84 to 92%, 
and light fraction samples ranged from 68 to 85%, which indicates 
that this level of sequencing captured the majority of bacterial taxa 
in these samples. β diversity was assessed by weighted UniFrac 
distances visualized within principal coordinate analysis (PCoA) 
plots. The results indicated that all samples from within each of 
the three soil treatments were grouped distinctly according to soil 
type (Fig. 2A), which was highly significant based on MRPP 
analysis (A = 0.18 [chance-corrected within-group agreement], 
T = -20.4 [test statistic], P < 0.001). The Arctic tundra and 
temperate rainforest soil profiles grouped more closely with one 
another, which is likely a result of both soils sharing low pH 
(Table 1), a major determinant of soil bacterial diversity and 
taxonomic composition (45, 46). In addition, all heavy and light 
fraction profiles for the three soils were clustered distinctly 
(Fig. 2A), which was also highly significant (A = 0.40, T = -28.3, 
P < 0.001). Native soil phylogenetic profiles clustered with their 
respective light fractions, indicating that the "background" 
bacterial community remained relatively constant throughout the SIP 
incubation. Although the two time points for some 13C-labeled 
substrates grouped together (Fig. 2B), the differences between 
heavy and light fractions were much greater than those observed 
between the five substrates used in this study. 

FIG 2 Principal coordinate analysis (PCoA) biplots of weighted UniFrac 
distances for 16S rRNA gene sequences generated by assembled paired-end 
Illumina reads. Samples separated by soil type and fraction (A) as well as by 
carbon source (B). Native soils were associated with their respective light 
fractions. Gray spheres represent taxonomic affiliations of OTUs that correlated 
most strongly within the ordination space. 
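
Good's coverage, reported above for the heavy and light fractions, is commonly estimated as one minus the proportion of sequences that fall into singleton OTUs. The sketch below assumes that standard estimator; the function name and the example counts are hypothetical and are not taken from the study's data.

```python
from typing import Sequence


def goods_coverage(otu_counts: Sequence[int]) -> float:
    """Estimate Good's coverage for one sample.

    otu_counts holds the number of sequences assigned to each OTU.
    Coverage = 1 - (number of singleton OTUs / total number of sequences).
    """
    total_sequences = sum(otu_counts)
    if total_sequences == 0:
        raise ValueError("sample contains no sequences")
    singletons = sum(1 for count in otu_counts if count == 1)
    return 1.0 - singletons / total_sequences


# Hypothetical sample: four OTUs, two of them observed only once.
example_counts = [5, 3, 1, 1]
print(goods_coverage(example_counts))  # 0.8
```

Under this estimator, lower coverage in a sample reflects a larger share of sequences in singleton OTUs at the same sequencing depth.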

Many operational taxonomic units (OTUs) were affiliated with 
SIP-derived heavy DNA, but multiple permutations of the 
analysis were required to summarize indicator OTUs for different 
sample subsets. We used indicator species analysis (47), with an 
indicator value (IV) threshold of 70% and a >250 minimum sequence 
sum threshold to identify the strongest significant OTUs (P < 
0.01) associated with (i) all heavy DNA samples (versus all light 
DNA samples) (Fig. 3; see Table S1 in the supplemental material), 
(ii) all heavy DNA samples within each soil type (versus all light 
DNA samples for the same soil type) (see Table S2 in the 
supplemental material), (iii) each substrate across all heavy DNA 
samples from all soil types (versus the heavy DNA for the other 
substrates from all soil types) (see Table S3 in the supplemental 
material), and (iv) each substrate from heavy DNA within each 
soil type (versus the other substrates for the same soil type heavy 
DNA) (see Tables S4 to S6 in the supplemental material). 

FIG 3 Cleveland plot of operational taxonomic unit (OTU) abundance for OTUs possessing the highest indicator values (i.e., >70%) for an association with 
DNA-SIP heavy DNA (black squares [average abundance]) for all substrates and soils combined, in comparison to light DNA (gray squares [average 
abundance]). Taxonomic affiliations are included for phyla, with additional classifications for order (o_), family (f_), and genus (g_). For additional details, see 
Table S1 in the supplemental material. 
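
To illustrate how an indicator value combines specificity and fidelity, the sketch below assumes the widely used formulation in which specificity is the group's share of summed mean abundance and fidelity is the fraction of samples in the group where the OTU is present. Whether reference 47 uses exactly this formulation is an assumption here; the function name and example abundances are hypothetical, and the permutation test behind the P values is not shown.

```python
from typing import Dict, Sequence


def indicator_values(abundance_by_group: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Compute an indicator value (percent) for one OTU in each sample group.

    abundance_by_group maps a group name (e.g., "heavy") to the OTU's
    abundance in each sample of that group.
    IV = 100 * specificity * fidelity, where
      specificity = group mean abundance / sum of group mean abundances
      fidelity    = fraction of samples in the group containing the OTU
    """
    means = {group: sum(values) / len(values)
             for group, values in abundance_by_group.items()}
    summed_means = sum(means.values())
    result = {}
    for group, values in abundance_by_group.items():
        specificity = means[group] / summed_means if summed_means > 0 else 0.0
        fidelity = sum(1 for value in values if value > 0) / len(values)
        result[group] = 100.0 * specificity * fidelity
    return result


# Hypothetical OTU: abundant in every heavy sample, rare in light samples.
example = {"heavy": [10, 20, 30], "light": [0, 0, 5]}
for group, value in indicator_values(example).items():
    print(group, round(value, 1))
```

In this hypothetical case the OTU exceeds a 70% threshold for the heavy group only, because it is both concentrated in and consistently present across the heavy samples.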

When we compared OTUs associated with all heavy DNA sam- 
ples versus all light DNA samples from all soils, indicator species 
analysis revealed multiple poorly classified indicators, in addition 
to genus-classified OTUs associated with the Salinibacterium (Ac- 
tinobacteria), Devosia (Alphaproteobacteria), Telmatospirillum 
(Alphaproteobacteria), Phenylobacterium (Alphaproteobacteria), 
and Asticcacaulis (Alphaproteobacteria) genera (Fig. 3; see Table S1 
in the supplemental material). The indicator species analysis from 
all heavy DNA samples versus all light DNA samples within each 
soil type showed that the predominant genus-classified OTUs 
identified in heavy fractions from tundra soil (1AT) were Salinibacterium 
(Actinobacteria), Rhodanobacter (Gammaproteobacte- 
ria), Conexibacter (Actinobacteria), Telmatospirillum (Alphapro- 
teobacteria), Asticcacaulis (Alphaproteobacteria), and Burkholderia 
(Betaproteobacteria), in addition to OTUs within orders such as 
Sphingomonadales and Acidobacteriales (see Table S2 in the sup- 
plemental material). The temperate rainforest soil (7TR) heavy 
DNA was dominated by OTUs classified to the genera Paucibacter 
(Betaproteobacteria), Burkholderia (Betaproteobacteria), Spiro- 
chaeta (Spirochaetes), Salinibacterium (Actinobacteria), Telmato- 
spirillum (Alphaproteobacteria), Labrys (Alphaproteobacteria), 
Mesorhizobium (Alphaproteobacteria), and Phenylobacterium (Al- 
phaproteobacteria), in addition to uncharacterized genera from 
other phyla, such as Verrucomicrobia (see Table S2). The 
agricultural soil wheat (11AW) heavy DNA OTUs were represented by 
the genera Pseudomonas (Gammaproteobacteria), Devosia 
(Alphaproteobacteria), Pseudoxanthomonas (Gammaproteobacteria), 
Salinibacterium (Actinobacteria), Ramlibacter (Betaproteobacteria), 
Ochrobactrum (Alphaproteobacteria), Paenibacillus (Firmicutes), 
and Aeromicrobium (Actinobacteria) and further unclassified 
members of the orders Pseudomonadales, Rhizobiales, 
Xanthomonadales, Actinomycetales, Burkholderiales, and Bacillales 
(see Table S2), among others. 

FIG 4 Glycoside hydrolase (GH) families associated with pooled heavy DNA. 
Functional annotation of the metagenomic data revealed diverse GH gene 
representation within the pooled heavy DNA. 

Orders associated with the metabolism of cellulose were dom- 
inated by Actinomycetales and Caulobacterales (genus Phenylobac- 
terium) (see Table S3 in the supplemental material). Members of 
the Alphaproteobacteria were associated with the metabolism of 
arabinose, and members of the order Rhizobiales were strongly 
associated with the metabolism of xylose. There were no specific 
indicator species associated with glucose or cellobiose across all 
soils (see Table S3), which might also suggest that abundant soil 
OTUs were also active in assimilating these substrates. 

The predominant indicator species for the agricultural soil fed 
with [13C]glucose were associated with Paenibacillus (Bacillales) 
(see Table S4 in the supplemental material). The use of cellulose 
was associated with Mesorhizobium (Alphaproteobacteria), Devo- 
sia (Alphaproteobacteria), and Cellvibrio (Gammaproteobacteria), 
in addition to other poorly classified OTUs from the Sphingomonadales 
and Actinomycetales. The use of cellulose in temperate rain- 
forest soil was associated with the Myxococcales (Deltaproteobac- 
teria) (see Table S5 in the supplemental material). An OTU 
affiliated with Caulobacterales was associated with the metabolism 
of glucose in Arctic tundra. Nevskia (Gammaproteobacteria) and 
two OTUs affiliated with the Acidobacteria were associated with 
tundra cellulose assimilation (see Table S6 in the supplemental 
material). No other OTUs were significant indicators for the re- 
maining substrates (i.e., cellobiose, arabinose, and xylose) for the 
three soils, which might indicate that active taxa were also abun- 
dant soil bacteria. 

Although our DNA-SIP incubation revealed many poorly 
classified indicator taxa (see Tables S1 to S6 in the supplemental 
material), many of the indicator species associated with heavy DNA 
were expected based on previous studies. For example, Salinibac- 
terium was associated with frozen soils from glaciers (48) and 
Antarctic permafrost (49). This genus has been associated with the 
metabolism of a variety of carbon sources, including sucrose, glu- 
cose, cellobiose, mannose, melibiose, maltose, galactose, arabi- 
nose, and fructose (48, 50). In addition, Devosia species were 
isolated from greenhouse soil and beach sediments, testing positive 
for the hydrolysis of esculin, β-galactosidase, β-glucosidase, and 
N-acetyl-β-glucosaminidase, although unable to degrade 
carboxymethyl cellulose (CMC) (51, 52). Phenylobacterium and 
Burkholderia are abundant in forest soils (53), and the genus 
Asticcacaulis was identified among aerobic chemoorganoheterotrophs in 
tundra wetlands, able to use glucose, sucrose, xylose, maltose, ga- 
lactose, arabinose, lactose, fructose, rhamnose, and trehalose, 
among other carbon sources (54). The genus Spirochaeta has some 
species that are free-living saccharolytic and obligate or facultative 
anaerobes and were isolated from diverse environments, mainly 
from extreme aquatic environments (55, 56). Spirochaeta ameri- 
cana was reported to be a consumer of D-glucose, fructose, mal- 
tose, sucrose, starch, and D-mannitol (56), and Spirochaeta ther- 
mophila was reported to be a cellulolytic organism; the study of its 
genome revealed a high proportion of genes encoding more than 
30 GHs (55). 

TABLE 2 Substrate-specific activities of positive metagenomic clones from the [13C]cellulose DNA-SIP library 

Activities for the five methylumbelliferone-based substrates are given as μM MU released^a. 

| Clone | Insert size (kb) | α-L-Arabinofuranoside pyranoside | β-D-Cellobiopyranoside | β-D-Glucopyranoside | β-D-Xylopyranoside | N-Acetyl-β-D-galactosaminide | CMC activity^b |
|---|---|---|---|---|---|---|---|
| C122 | 21.6 | 0.4 | 0.2 | 0.6 | 0.7 | 124.2 | |
| C424 | 8.2 | 0.9 | 57.6 | 109.4 | 1.6 | 0.7 | |
| C762 | 13.5 | 2.4 | 5.4 | 21.2 | 0.7 | 0.4 | |
| C1024 | 16.8 | 123.8 | 6.5 | 35.8 | 1.7 | 0.5 | |
| C1088 | 11.9 | 0.5 | 25.6 | 79.2 | 1.2 | 0.6 | |
| C2194 | 12.9 | 0.5 | 0.3 | 0.6 | 0.4 | 39.6 | |
| C2380 | 14.9 | 0.38 | 0.46 | 0.53 | 0.41 | 0.40 | +++ |
| C2044 | 14.7 | 0.40 | 0.40 | 0.52 | 0.39 | 0.36 | ++ |

^a Cellulase activity was scored by Congo red staining of clones on the LB-CMC plate. Other activities were measured in cell-free extracts using methylumbelliferone-based substrates. MU, methylumbelliferone units based on equal volumes of sample for each assay. 
^b CMC, carboxymethyl cellulose. Plate-based clearing (high, +++; medium, ++; negative, -) was detected by Congo red stain and activity based on comparison to those of positive and negative controls. 

TABLE 3 Analysis of cosmid insert end sequences 

BLASTx result for^a: 

| Clone | Forward read: description | Forward read: E value (% identity [no. positive/total]) | Reverse read: description | Reverse read: E value (% identity [no. positive/total]) |
|---|---|---|---|---|
| C122 | Porphyromonas gingivalis (4-amino-4-deoxy-L-arabinose transferase) | 4e-5 (29 [40/139]) | Cellvibrio japonicus Ueda107 (β-xylosidase) | 8e-136 (82 [131/162]) |
| C424 | Cellvibrio sp. strain BR (DNA-directed DNA polymerase) | 1e-28 (69 [66/80]) | Cellvibrio sp. strain BR (glucuronate isomerase) | 2e-103 (91 [157/163]) |
| C762 | Chthoniobacter flavus (putative PAS/PAC sensor protein) | 1e-86 (78 [151/171]) | Sorangium cellulosum (hypothetical protein) | 2e-28 (54 [83/125]) |
| C1024 | Cellvibrio sp. strain BR (glucuronate isomerase) | 2e-17 (95 [34/40]) | Cellvibrio sp. strain BR (gluconolactonase) | 2e-46 (80 [85/96]) |
| C1088 | Saccharophagus degradans (SSS sodium solute transporter superfamily) | 6e-61 (68 [123/150]) | Cellvibrio sp. strain BR (auxin efflux carrier) | 5e-44 (75 [101/114]) |
| C2194 | Dyadobacter fermentans (ROK family protein) | 1e-91 (95 [140/142]) | Failed sequencing reaction | |
| C2380 | Alicyclobacillus acidocaldarius (glyoxalase/bleomycin resistance protein/dioxygenase) | 2e-15 (52 [51/69]) | Cellvibrio sp. strain BR (glucosamine fructose-6-phosphate aminotransferase, isomerizing) | 3e-105 (96 [162/163]) |
| C2044 | Cellvibrio sp. strain BR (DNA polymerase III subunit delta) | 1e-71 (96 [116/118]) | Dyadobacter fermentans (hypothetical protein) | 9e-129 (97 [181/184]) |

^a Cosmids were end sequenced with M13 forward and reverse primers flanking the site of metagenomic DNA insertion. For each clone, two end sequences were obtained and are 
referred to as "reverse" and "forward" reads. Top matches for BLASTx analyses are shown. Positive results are the number of amino acids from the query that match the amino 
acids from the subject sequence. The total number of amino acids from the subject is shown. 

MG-RAST analysis and functional annotation. We used 
next-generation sequence analysis of bulk 13C-labeled DNA to survey 
the prevalence of annotated GHs within three pooled samples that 
were targeted for subsequent functional metagenomic screens. 
Guided by the UniFrac-based PCoA plot (Fig. 2), we pooled heavy 
DNA samples representing all substrates (except cellulose) associ- 
ated with low pH (i.e., temperate rainforest, Arctic tundra), heavy 
DNA for all substrates (except cellulose) from the agricultural soil, 
and the cellulose-enriched DNA from the three soils. Analysis of 
paired-end reads was performed by MG-RAST using annotations 
derived from the Swiss-Prot/Uniprot database. Only 19.4% 
(low-pH library), 19.6% (cellulose library), and 22.0% (agricul- 
tural library) of sequences were annotated by Swiss-Prot in MG- 
RAST using an E value threshold of 0.01, which is an important 
consideration for any subsequent analysis of annotation data 
based on this minority of sequences. Nonetheless, using a custom 
Perl script to convert Swiss-Prot annotations to CAZy GH identi- 
fiers, we detected 81 distinct GH families for the pooled-cellulose 
library and 80 GH families for each of the low-pH and agricultural 
soil composite libraries. The distribution of annotated GHs varied 
between samples, and the most abundant families in the three 
pooled samples were GH1, -2, -3, -5, -9, -13, -23, -28, and -35 (see 
Table S7 in the supplemental material). In addition, the three 
next-generation sequence data sets were very similar in their dis- 
tributions (i.e., r > 0.99) for the three libraries (Fig. 4), and all had 
representation among GH families commonly associated with 
known cellulases (GH1, -3, -5 to -9, -12, -45, -48, and -61), hemi- 
cellulases (GH8, -10 to -12, -26, -28, -53, and -74), and debranch- 
ing enzymes (GH51, -54, -62, -67, -78, and -74) as reviewed else- 
where (57, 58). The GH families involved in the hydrolysis of 
cellulose that were most abundant in our data were GH families 3, 
5, and 9 (Fig. 4; see Table S7). However, given that most GH family 
annotations were not represented by known CAZy identifiers and 
that only ~20% of our paired-end reads were annotated by 
Swiss-Prot, the abundance and distribution of functional GH families 
in our pooled DNA are likely underrepresented. As a result, we used 
functional screens of large-insert metagenomic libraries for the 
recovery of GHs to help circumvent the limitations of sequence-based 
analysis of our heavy DNA samples. 



Enriched metagenomic library. Pooled high-molecular-weight 
DNA from the 13C-cellulose-enriched SIP incubations for 
the three soils was captured in cosmid libraries and screened for 
GHs involved in the degradation of cellulose and other 
plant-derived polymers based on activity in E. coli. 
Multiple-displacement amplification (MDA) increased the amount of 
nucleic acids obtained from pooled cellulose DNA-SIP incubations 
prior to the isolation of 25- to 75-kb DNA fragments via 
pulsed-field gel electrophoresis (PFGE). The cellulose-SIP metagenomic 
library contained ~83,000 clones with an average insert size of 
31 kb based on restriction digestion of a subset of 40 random 
clones (data not shown). These results compare favorably to a 
library of ~10,500 clones generated from MDA-amplified 
SIP-enriched seawater DNA, which had an average insert size of 27 kb, 
ranging from 17 to 40 kb (26). 

We used a combined parallel approach for functional screening 
of 2,876 randomly selected clones from the cellulose-enriched 
metagenomic library. Growth of colonies on LB supplemented 
with carboxymethyl cellulose (CMC), followed by poststaining 
with Congo red (59), facilitated identification of clones expressing 
either endoglucanase (EC 3.2.1.X) or glucosidase (EC 3.2.1.X) 
activities (60). From the 2,876 clones screened, we identified eight 
positive clones, two of which (C2380 and C2044) were capable of 
hydrolyzing CMC (Table 2). Restriction mapping showed that 
these two clones were distinct (Fig. 5). Clones C122 and C2194 
carried dissimilar DNAs encoding β-N-acetylgalactosaminidases 
(EC 3.2.1.53). β-Glucosidase activities (EC 3.2.1.21) were 
detected in clones C424, C762, and C1088. Clones C424 and C1088 
contained overlapping DNA, probably from the same organism, 
consistent with the substrate activity profiles. The restriction 
pattern of clone C1024 was similar to those of C1088 and C424 (Fig. 5), 
but C1024 had both α-L-arabinofuranosidase (EC 3.2.1.55) and 
β-glucosidase (EC 3.2.1.21) activities. The open reading frame (ORF) 
encoding the β-glucosidase was likely located in the overlapping 
region. 

FIG 5 Restriction of cosmid DNA with EcoRI-HindIII-BamHI. DNA sizes in 
kb are marked on the left and right. M, molecular size markers. The sizes of 
digested DNA fragments except for the cosmid backbone (the very top band) 
were added up to obtain the insert sizes of the cloned metagenomic DNA. 
Insert sizes (kb) of the eight cosmid clones, in lane order: 32.2, 25.1, 33.9, 
31.6, 34.5, 29.6, 25.9, and 29.1. 

End sequencing of the positive isolates demonstrated that most 
clones had at least one end sequence matching the known cellulolytic 
member of the Gammaproteobacteria, Cellvibrio sp. (61), with 
69 to 95% identity (Table 3). Other top BLAST matches included 
Saccharophagus degradans, Dyadobacter fermentans, Alicyclobacil- 
lus acidocaldarius, and Chthoniobacter flavus (Table 3), with 29 to 
97% identity. Although these bacteria are not well characterized to 
date, other researchers have reported that they use cellulose and 
other carbohydrates as a carbon source and/or they contain GHs 
encoded in their genome (62-65). As predicted, the end sequence 
identities for C424 and C1088 were very similar taxonomically 
(i.e., Cellvibrio sp.). On the other hand, end sequence data for 
C122 and C2194 did not suggest a similar genomic origin (Ta- 
ble 3), consistent with the restriction pattern of these cosmids 
(Fig. 5). 

Posterior analysis of reverse and forward end sequences of the 
positive clones was done by comparing end sequences to Illumina 
forward and reverse reads from whole-genome sequencing of the 
three SIP libraries (see Table S8 in the supplemental material). The 
results showed that the majority of end sequences were repre- 
sented in the cellulose library, as expected, and only a few se- 
quence matches were found in other libraries using the selected 
threshold. 

The high frequency of positive clones after screening of 
DNA-SIP-derived clones compares favorably to those from previous soil 
functional metagenomic studies reporting the recovery of single 
positive cellulose hits from screening of tens of thousands of 
clones. For example, a single cellulase-encoding clone and two 
xylanase-encoding clones were recovered from functional screening 
of 13,800 clones from three fosmid metagenomic libraries derived 
from grassland in Germany, with an insert size range of 
between 19 and 30 kb (11). Also, one cellulase-encoding clone was 
retrieved from the functional screening of 3,024 clones from a 
bacterial artificial chromosome metagenomic library derived 
from red soil in China, with insert sizes ranging from 25 to 165 kb 
(12). In another study, one cellulase-encoding clone was recov- 
ered from functional screening of 14,000 clones with an average 
insert size of 5 kb from a metagenomic phagemid library from a 
forest soil in China (13). Finally, a CMC-positive clone was re- 
trieved from a metagenomic fosmid library derived from wetland 
soil in South Korea, after screening of 70,000 clones with an aver- 
age insert of 40 kb (14). Although not conducted here, a well- 
replicated direct comparison of GH gene recovery from meta- 
genomic libraries prepared from SIP-derived heavy DNA, light 
DNA, and the original soil DNA would be necessary to confirm 
the effectiveness of DNA-SIP. In addition, the ability to recover 
GH genes in high proportions using cultivation-based enrichment 
approaches is a well-established alternative to direct meta- 
genomics (15). DNA-SIP incubations are designed to be less de- 
pendent on rapid growth of a readily cultivated subset of the mi- 
crobial community (40). Indeed, our labeled DNA contained 
many OTUs that were classified poorly within described bacterial 
taxonomies (see Tables S1 to S6 in the supplemental material). 
Direct DNA-SIP and enrichment culture comparisons would be 
valuable but have not yet been conducted to our knowledge. 
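
The screening frequencies cited above can be put on a common scale with a few lines of arithmetic. The sketch below is illustrative only: it uses the clone counts quoted in the text, counts the two xylanase clones from the German grassland study as GH hits, and ignores differences in insert size, vector, host, and screening substrate, so it is a rough comparison rather than the controlled test that the text says is still needed.

```python
# (positive clones, clones screened), taken from the counts quoted in the text.
screens = {
    "this study, 13C-cellulose DNA-SIP cosmid library": (8, 2876),
    "German grassland fosmid libraries (11)": (3, 13800),  # 1 cellulase + 2 xylanase clones
    "Chinese red soil BAC library (12)": (1, 3024),
    "Chinese forest soil phagemid library (13)": (1, 14000),
    "South Korean wetland fosmid library (14)": (1, 70000),
}


def hit_rate_percent(positives, screened):
    """Return positive clones as a percentage of the clones screened."""
    return 100.0 * positives / screened


rates = {name: hit_rate_percent(p, n) for name, (p, n) in screens.items()}

# Print libraries from highest to lowest hit rate.
for name, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    clones_per_hit = 100.0 / rate
    print(f"{name}: {rate:.3f}% (about {clones_per_hit:,.0f} clones screened per hit)")
```

With these counts, the DNA-SIP library gives roughly 0.3% positive clones, which matches the figure quoted in the summary paragraph.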

In summary, the combination of DNA-SIP and metagenomics 
helped recover soil GHs in higher proportions than all previously 
reported efforts via direct metagenomics, which demonstrates the 
power of using DNA-SIP as an activity-based prefilter for targeted 
metagenomic approaches. Our study demonstrated the capability 
of scaling DNA-SIP analysis for the interrogation of multiple en- 
vironmental samples with multiple substrates, with sampling at 
multiple time points. A high-quality cosmid library with >31-kb 
inserts was constructed from heavy DNA originating from a 13 C- 
cellulose-incubated sample, and highly efficient screening of GHs 
from a small set of clones (0.3% positive hits) showed strong po- 
tential of the techniques combined in this study for functional 
metagenomics. Identification of the genes encoding GHs and 
characterization of these enzymes are ongoing and further func- 
tional screening of the 13 C-cellulose DNA-SIP library clones in 
other surrogate hosts will be assessed to identify additional GH 
representation. 

MATERIALS AND METHODS 

Soil samples. Three soil samples from the Canadian MetaMicroBiome 
Library (http://www.cm2bl.org/) were used: Arctic tundra 1 (1AT), 
temperate rainforest (7TR), and agricultural soil-wheat (11AW). Triplicate 
surface soils from the top 10 cm below the litter layer were combined to 
prepare a single composite for each site. Composite soil samples were 
sieved (2 mm), and subsamples were sent to the Agriculture and Food 
Laboratory (University of Guelph, Guelph, Ontario, Canada) for analysis 
of physicochemical properties (Table 1). 

SIP. D-Glucose was obtained from Bio Basic (Markham, Ontario, 
Canada). (U-13C6)-D-glucose (99%) was supplied by Cambridge Isotope 
Laboratories (Cambridge, Ontario, Canada). D-(+)-cellobiose, 
D-(-)-arabinose, and D-(+)-xylose were purchased from Sigma-Aldrich. 
D-(UL-13C5)-arabinose, D-(UL-13C5)-xylose, and (UL-13C12)-cellobiose were 
obtained from Omicron Biochemicals (South Bend, IN). 

To minimize carbon available for competition with labeled substrates, 
composite soil samples were preincubated for 2 weeks in darkness at 15°C 
for 1AT and at 24°C for 7TR and 11AW. Ten grams of each soil sample was 
added to a 120-ml serum vial, which was sealed with a butyl septum. 
Incubations were conducted with stable-isotope (13C) and native (12C) 
substrates, as well as no-substrate controls, for each of the three soils. Finely 
shredded cellulose was prepared from Gluconacetobacter xylinus grown 
with 13 C- or 12 C-glucose (30) as the sole carbon source. Purified bacterial 
cellulose (200 mg, 6.6 mmol C) was mixed into serum vials in a single 
dose. Labeled ( 13 C) and unlabeled ( 12 C) substrates were added to soil 
samples in multiple dosages over periods of 1 week and 3 weeks for glu- 
cose, cellobiose, xylose, and arabinose incubations or 3 weeks and 6 weeks 
for the cellulose incubations. Serum vials were aerated once per week for 
1 h in a fume hood. The weight of incubation vials was assessed weekly, 
and water-filled pore space (WFPS) was maintained between 50 and 60% 
by adding distilled water and/or substrate for each incubation according 
to the following formula (34): $\mathrm{WFPS} = w\,[\rho_b \rho_s/(\rho_s - \rho_b)]$, where $w$ is the 
gravimetric water content (%), $\rho_b$ is the soil bulk density (g/cm3), and $\rho_s$ is 
the soil particle density (2.65 g/cm3). 
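
As a worked illustration of the water-filled pore space calculation, the sketch below assumes the relationship WFPS = w × ρb × ρs / (ρs − ρb), with water density taken as 1 g/cm3. The function names and the example soil values (25% gravimetric water, bulk density 1.2 g/cm3) are hypothetical and are not measurements from this study.

```python
def wfps_percent(w_percent, bulk_density, particle_density=2.65):
    """Water-filled pore space (%) from gravimetric water content (%),
    bulk density (g/cm3), and particle density (g/cm3).
    Assumes a water density of 1 g/cm3."""
    if not 0 < bulk_density < particle_density:
        raise ValueError("bulk density must be between 0 and the particle density")
    return w_percent * bulk_density * particle_density / (particle_density - bulk_density)


def water_to_add_g(dry_soil_g, w_percent, bulk_density,
                   target_wfps=55.0, particle_density=2.65):
    """Grams of water needed to raise a soil to the target WFPS (%).
    Returns 0 if the soil is already at or above the target."""
    target_w = target_wfps * (particle_density - bulk_density) / (
        bulk_density * particle_density)
    return max(0.0, dry_soil_g * (target_w - w_percent) / 100.0)


# Hypothetical example values, not data from this study.
example_wfps = wfps_percent(25.0, 1.2)
example_water = water_to_add_g(10.0, 20.0, 1.2)
print(f"WFPS = {example_wfps:.1f}%")
print(f"Water to add = {example_water:.2f} g")
```

A target of 55% sits in the middle of the 50 to 60% range maintained during the incubations; any value in that range could be used instead.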

GC. CO2 accumulation in the headspaces of serum vials was determined 
using a GC-2014 gas chromatograph (Shimadzu) equipped with a 
thermal conductivity detector (TCD), methanizer, and a flame ionization 
detector (FID). The gas chromatography (GC) temperatures were 
maintained for the oven (80°C), TCD (280°C), methanizer (380°C), and FID 
(250°C). No-carbon control incubations and separate serum vials 
amended with 12C-glucose were used as surrogates for experimental vials 
because an N2-free headspace was required for measurement of O2 with 
the gas chromatograph. The headspaces of these separate vials were 
flushed with helium and supplemented with oxygen (20%) at the start of 
the experiment. Headspace CO2 and O2 were measured every 3 days by 
direct injection of 0.5 ml of headspace gas through a packed Porapak Q 
column with a helium flow of 20 ml/min. 

DNA extraction and isopycnic centrifugation. Two grams of soil was 
sampled from each vial at the time points described above. DNA was 
extracted with a PowerSoil DNA Isolation kit (MO BIO Laboratories, 
Carlsbad, CA) according to the manufacturer's instructions. Extracted 
DNA was quantified using a NanoDrop 2000 UV-Vis spectrophotometer 
(Thermo Scientific; Montreal, Quebec, Canada) and a 1% agarose gel with 
a 1-kb DNA ladder (Invitrogen) for comparison. Cesium chloride (CsCl) 
gradients were processed by ultracentrifugation, and 12 fractions were 
collected for each sample as described previously (16, 66). 

DGGE. The V3 regions of bacterial 16S rRNA genes were PCR amplified 
using primers 341f-GC and 518r (67). Each reaction mixture 
contained 19.75 µl of UV-treated water, 2.5 µl of 10X ThermoPol reaction 
buffer (New England BioLabs), 0.05 µl of deoxynucleoside triphosphates 
(dNTPs) (100 mM), 0.05 µl of forward primer 341f-GC (100 µM), 0.05 µl 
of reverse primer 518r (100 µM), 1.5 µl of bovine serum albumin (BSA) 
(10 mg/ml), 0.25 µl of Taq DNA polymerase (5 U/µl) (New England 
BioLabs), and 1 µl of DNA template purified from each gradient fraction. 
The PCR conditions were initial denaturation at 95°C for 5 min, followed 
by 30 cycles of denaturation at 95°C for 1 min, annealing at 55°C for 1 min, 
and extension at 72°C for 1 min, followed by a final extension at 72°C for 
7 min. All PCR products were analyzed on 1% agarose gels prior to DGGE. 
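
The final concentrations implied by the DGGE reaction recipe follow from simple dilution (stock concentration × volume added / total volume). The sketch below assumes the volumes are in microliters and the primer stocks in micromolar, which is our reading of the recipe; the dictionary layout and function name are illustrative.

```python
# Component: (volume in µl, stock concentration, unit of the stock).
# None marks components with no stock concentration given in the recipe.
dgge_mix = {
    "UV-treated water": (19.75, None, None),
    "10X ThermoPol buffer": (2.5, 10.0, "X"),
    "dNTPs": (0.05, 100.0, "mM"),
    "primer 341f-GC": (0.05, 100.0, "µM"),
    "primer 518r": (0.05, 100.0, "µM"),
    "BSA": (1.5, 10.0, "mg/ml"),
    "Taq DNA polymerase": (0.25, 5.0, "U/µl"),
    "DNA template": (1.0, None, None),
}


def final_concentrations(mix):
    """Return the total reaction volume (µl) and a dict of
    component -> (final concentration, unit) by simple dilution."""
    total = sum(volume for volume, _, _ in mix.values())
    finals = {
        name: (stock * volume / total, unit)
        for name, (volume, stock, unit) in mix.items()
        if stock is not None
    }
    return total, finals


total_ul, finals = final_concentrations(dgge_mix)
print(f"Total volume: {total_ul:.2f} µl")
for name, (conc, unit) in finals.items():
    print(f"{name}: {conc:.3g} {unit}")
```

Under these assumptions the reaction volume is about 25 µl, with roughly 0.2 mM dNTPs and 0.2 µM of each primer.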

Five microliters of each PCR product was loaded onto a 10% poly- 
acrylamide gel with a denaturing gradient of 30 to 70%. Gels were run at 
60°C for 14 h at 85 V (DGGEK-2001-110; C.B.S. Scientific, San Diego, 
CA) as described previously (43). A custom DGGE ladder was loaded into 
the two outside wells of the gel for subsequent normalization. Gels were 
stained for 45 min with SYBR green I nucleic acid gel stain (Thermo 
Fisher) and rinsed once in water prior to imaging. Gel images were taken 
with a Pharos Plus molecular imager system (Bio-Rad). 

Next-generation sequencing. High-throughput sequencing of the 
16S rRNA gene (V3 region) and paired-end read assembly were 
conducted as described previously (68, 69). Based on DGGE data, we 
sequenced gradient fractions 6 (heavy) and 10 (light) for 1AT and fractions 
5 (heavy) and 10 (light) for 7TR and 11AW (60 samples in total). Three 
25-µl PCR amplifications per sample were conducted, each containing 
5 µl of the 5X Phusion HF buffer (Finnzymes, Finland), 0.125 µl of the 
V3F-modified primer (100 µM), 1.25 µl of an indexed reverse primer 
(10 µM) (V3-1R to V3-60R), 0.2 µl of dNTPs (100 mM), 0.25 µl of the 
Phusion high-fidelity DNA polymerase (2 U/µl) (Finnzymes), and 1 µl of 
DNA template (1 to 10 ng). The PCR conditions were as follows: initial 
denaturation at 98°C for 2 min, followed by 20 cycles of denaturation at 
98°C for 10 s, annealing at 50°C for 30 s, and extension at 72°C for 15 s. A 
final extension was performed at 72°C for 7 min. The triplicate 330-bp 
PCR products were pooled and analyzed on a 2% agarose gel. Individually 
indexed composites were combined in equal nanogram amounts and then 
resolved on a 2% agarose gel. The amplicon fragment was excised and 
purified using the Wizard SV gel and PCR cleanup system (Promega, 
Madison, WI). Libraries were subjected to 108-bp end sequencing on the 
Genome Analyzer IIx (Illumina, Inc., San Diego, CA) at the Plant 
Biotechnology Institute (Saskatoon, Saskatchewan, Canada). 

Shotgun metagenomic sequencing was performed on DNA from three 
pooled fractions of the 13C-labeled DNA from each treatment. Pooling of 
heavy DNA resulted in three composite samples for sequencing: (i) "low 
pH" (fractions 5, 6, and 7 of 1AT and fractions 4, 5, and 6 of 7TR) for week 
3 incubations with glucose, cellobiose, arabinose, and xylose; (ii) 
"agricultural" (fractions 4, 5, and 6 for 11AW) for week 3 incubations with 
glucose, cellobiose, arabinose, and xylose; and (iii) "cellulose" (fractions 5, 6, 
and 7 for 1AT and fractions 4, 5, and 6 for 7TR and 11AW) for week 6 
incubations with cellulose. Shotgun sequencing samples of metagenomic 
DNA were prepared using the Nextera DNA sample preparation kit 
(Illumina). Pooled heavy DNA (25 to 50 ng) was fragmented using the 
tagmentation reaction (~200 to 5,000 bp), according to the manufacturer's 
instructions, and purified using the DNA Clean & Concentrator kit (Zymo 
Research Corporation, Irvine, CA). Purified fragments were used as the 
template for a five-cycle PCR amplification; indexed sequencing adapters 
(Epicentre, Madison, WI) were used for the PCR. Each amplified sample 
was purified and subjected to size selection (400 to 800 bp) using a Pippin 
Prep device (Sage Science, Beverly, MA). Afterward, each library was 
quantified using the KAPA library quantification kit (KAPA Biosystems, 
Woburn, MA). Equimolar samples were pooled, concentrated, and 
quantified. Final concentrations were adjusted to 10 nM. Libraries were 
sequenced using the HiSeq2000 sequencing system (Illumina) by the 
Institute for Genomic Biology Core Facility (University of Illinois). 
Sequencing was performed using a TruSeq SBS kit (version 3), and data were 
analyzed using the CASAVA 1.8 pipeline. Error rates were estimated at 
below 0.3%. Each sample yielded 42 to 90 million 100-bp end reads of 62 
to 63% average GC content. 

Statistical analysis. Taxonomic classification with RDP v2.2 (confi- 
dence 0.8 and GreenGenes Oct 2012 revision), principal coordinates anal- 
ysis (PCoA) with weighted UniFrac distances, multiresponse permutation 
procedures (MRPP), and indicator species (IS) analyses of 16S rRNA gene 
sequences generated by assembled paired-end reads were performed us- 
ing automated exploration of microbial diversity (AXIOME) automation 
of PANDAseq (69), the QIIME pipeline (70), and custom AXIOME anal- 
yses (71). 

MG-RAST analysis and CAZy annotation. Paired-end shotgun 
sequences from the pooled heavy DNA samples were analyzed for GHs 
using the MG-RAST pipeline (72). Reads were annotated by comparison 
to sequences in the UniProt database (73), with no maximum E value 
cutoff, a 54% minimum percentage identity cutoff, and a 30-bp 
minimum-alignment-length cutoff. Using custom Perl scripts (see 
Algorithms S1 and S2 in the supplemental material), Swiss-Prot and TrEMBL 
database (UniProt release 2012 to 2014) hits were paired with matching 
GH family CAZy identifiers by comparing an extracted database of 
accession numbers to CAZy identifiers (see Texts S1 and S2 in the supplemental 
material). 
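
The pairing of database hits with GH families amounts to a lookup-and-count step. The following Python sketch shows the general idea only; it is not a translation of the Perl scripts in the supplemental material, and the accession strings and the small lookup table are made-up placeholders rather than real UniProt or CAZy entries.

```python
from collections import Counter


def count_gh_families(hit_accessions, accession_to_gh):
    """Tally GH families for a list of annotated hits.

    hit_accessions: one accession per annotated read.
    accession_to_gh: lookup of accession -> GH family label.
    Returns (Counter of family -> number of reads, number of unmapped hits).
    """
    counts = Counter()
    unmapped = 0
    for accession in hit_accessions:
        family = accession_to_gh.get(accession)
        if family is None:
            unmapped += 1  # hit has no known CAZy GH identifier
        else:
            counts[family] += 1
    return counts, unmapped


# Placeholder lookup table and hits (hypothetical, for illustration only).
lookup = {"ACC0001": "GH3", "ACC0002": "GH5", "ACC0003": "GH9"}
hits = ["ACC0001", "ACC0002", "ACC0001", "ACC9999", "ACC0003", "ACC0001"]

family_counts, n_unmapped = count_gh_families(hits, lookup)
for family, n in family_counts.most_common():
    print(family, n)
print("hits without a GH family:", n_unmapped)
```

Tracking the unmapped hits separately makes visible how much of the annotated data carries no GH assignment, which is the limitation noted in the Results.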

Cellulose-enriched metagenomic library construction. 
High-molecular-weight DNA was extracted from all three soil samples that were 
amended with 13C-labeled bacterial cellulose (week 6 time point), using a 
gentle enzymatic lysis (74). Humic acids were removed from crude DNA 
as described previously (75), using the SCODA device (Aurora, Boreal 
Genomics; Vancouver, BC, Canada) with one wash cycle (70 V/cm, 10°C, 
90 min) and two concentration cycles (70 V/cm, 10°C, 60 min). DNA was 
analyzed using a 1% agarose gel and quantified with the NanoDrop 2000 
spectrophotometer. Samples were subjected to cesium chloride density 
gradient ultracentrifugation and fraction collection as described 
previously with minor modifications. Gradient fractions were diluted with 
1 volume of water and then, following addition of 2 volumes of ethanol, 
the DNA was precipitated overnight at -20°C. DNA was collected by 
centrifugation for 30 min at 13,000 × g. The DNA was air dried, dissolved 
in 300 µl of water, and then precipitated by adding 1/10 volume of 3 M sodium 
acetate (pH 5.3) and 2 volumes of ethanol. After confirming that the 
fingerprints generated from an alternative lysis protocol were the same as 
those observed by DGGE, pooled samples and fractions for large-insert 
cosmid cloning were mixed in the same equal nanogram ratio used to 
prepare template for sequence-based metagenomics. 

To obtain a sufficient amount of DNA for 13 C-cellulose-enriched met- 
agenomic library construction, triplicate multiple displacement amplifi- 
cation (MDA) reactions were conducted using the illustra GenomiPhi V2 
DNA amplification kit (GE Healthcare, Mississauga, Ontario, Canada), 
according to the manufacturer's instructions. Each reaction mixture 
consisted of ~7 ng of DNA template in order to minimize potential 
amplification bias (26, 30, 76), yielding 3 to 4 µg of amplified DNA. 
Positive-control DNA from the kit and negative controls without DNA were run in 
parallel. MDA products were quantified on a 1% agarose gel and then 
pooled. 

To inactivate φ29 DNA polymerase, MDA-amplified DNA (100 µl) 
was mixed with 613 µl of Tris-EDTA (TE), 73 µl of 10X gel loading 
buffer, and 6.8 µl of 20% SDS. After being heated at 65°C for 10 min, the 
sample was left on ice for 5 min and then centrifuged at 15,900 × g for 
5 min. The DNA-containing supernatant was loaded onto a 1% 
pulsed-field agarose gel (with Tris-acetate-EDTA [TAE] buffer) in order to size 
select DNA. Pulsed-field gel electrophoresis (PFGE) (CHEF Mapper; 
Bio-Rad) was run at 14°C, 5.5 V/cm, 120° angle, and an initial 1.0-s to final 
6.0-s switch time for 20 h. The outer lanes were loaded with a size marker, 
and following electrophoresis, these lanes were sliced off, poststained with 
SYBR green I nucleic acid gel stain, and visualized with a Clare Chemical 
Research Dark Reader. After reassembly of the gel, a gel slice 
corresponding to 25 to 75 kb of sample DNA was excised, electroeluted, and 
concentrated as described previously (77). DNA end repair, ligation with cosmid 
pJC8, packaging, and transduction into E. coli HB101 were performed as 
reported previously (77). Resulting recombinant cosmid clones were 
pooled and saved in 7% dimethyl sulfoxide (DMSO) in 1-ml aliquots at 
-75°C. Prior to pooling, 40 random E. coli clones from the plates were 
selected for analysis of cosmid DNA restriction patterns. The average sizes 
of cloned metagenomic DNA and coverage of bacterial genomes were 
calculated based on sizes of EcoRI-HindIII-BamHI fragments and the 
number of recombinant library clones. Additionally, 2,876 random clones 
were inoculated into LB-Tc in 96-well plates and then grown overnight at 
37°C for functional screening. 
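
The coverage calculation mentioned above can be sketched as total cloned DNA (number of clones × average insert size) divided by a representative genome size. The clone count (~83,000) and average insert size (31 kb) are the values reported for this library; the 5-Mb genome size is purely an assumption for illustration, and the function name is hypothetical.

```python
def library_stats(n_clones, mean_insert_kb, genome_size_mb=5.0):
    """Total cloned DNA (Mb) and the number of genome equivalents it
    represents, assuming a single average genome size (an assumption)."""
    total_mb = n_clones * mean_insert_kb / 1000.0
    return {
        "total_mb": total_mb,
        "genome_equivalents": total_mb / genome_size_mb,
    }


# Whole library (~83,000 clones) and the screened subset (2,876 clones).
full_library = library_stats(83_000, 31)
screened_subset = library_stats(2_876, 31)

print(f"Full library: {full_library['total_mb']:.0f} Mb, "
      f"{full_library['genome_equivalents']:.0f} genome equivalents")
print(f"Screened subset: {screened_subset['total_mb']:.1f} Mb, "
      f"{screened_subset['genome_equivalents']:.1f} genome equivalents")
```

Because soil communities contain many genomes of differing size and abundance, genome equivalents computed this way indicate scale only and not actual coverage of any one organism.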

Functional screening. Clones were randomly selected and subjected 
to activity-based screening of GHs in E. coli HB101. These clones were 
grown in 96-well microtiter plates and were replicated onto 150-mm 
LB-Tc agar plates supplemented with carboxymethyl cellulose (CMC) 
(0.2%). The plates were incubated at 37°C for 1 week. Following removal 
of colonies from the plates by washing with water, 0.1% Congo red was 
used for poststaining. 

These clones were also tested for activity on a host of 
methylumbelliferyl-based fluorogenic proxy substrates. Clones were first 
grown in LB broth containing 15 µg/ml tetracycline at 37°C in microtiter 
plates. Each well contained one glass bead, and plates were incubated with 
orbital shaking. After 24 h, 70 µl of preculture was transferred to a 
deep-well plate (96 wells) and cultured in Terrific Broth containing 15 µg/ml 
tetracycline for a further 24 h at 37°C with a glass bead and orbital 
shaking. Cells were collected by centrifugation and frozen. For lysis, cell pellets 
were thawed and chemically lysed using the BugBuster protein extraction 
reagent (Novagen). GH activities in cell-free extracts were measured 
using α-L-arabino-furanoside/pyranoside, β-D-cellobiopyranoside, 
β-D-glucopyranoside, β-D-xylopyranoside, and 
N-acetyl-β-D-galactosaminide. Reactions were carried out in 384-well microplates. 
Library lysates were incubated with 0.1 mM each substrate for 1 h at 50°C in 
a 40-µl sodium citrate-buffered (50 mM, pH 5) reaction mixture. 
Reactions were stopped by the addition of 40 µl of 0.2 M glycine (pH 10). 
Fluorescence was detected at 445 nm following excitation at 370 nm. 
Clones that demonstrated activity on one or more substrates were 
subcultured and rescreened on appropriate substrates to eliminate false-positive 
reactions. Protein concentrations were measured by the Bradford method 
with bovine serum albumin (BSA) used as a standard. 

End sequences of positive cosmid clones were obtained by Sanger 
sequencing using M13 forward and reverse primers at TCAG (Toronto, 
Ontario, Canada). We used BLASTx searches of translated nucleotide 
sequences against the NCBI protein database. End sequences were depos- 
ited in GenBank. Posterior BLAST analysis was done searching for se- 
quence similarities in the three libraries: low pH, agricultural, and cellu- 
lose (forward and reverse). Sequences with >95% similarity and >30 bp 
were recorded as positive matches. 
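
The positive-match criterion (>95% similarity over >30 bp) can be applied as a simple filter on BLAST results. The sketch below assumes the hits are available in standard BLAST tabular output (tab-separated, with percent identity in column 3 and alignment length in column 4); the paper does not state which output format was used, and the sequence identifiers in the example are hypothetical.

```python
def positive_matches(tabular_lines, min_identity=95.0, min_length=30):
    """Return (query, subject, percent identity, alignment length) for hits
    with identity > min_identity and alignment length > min_length.
    Expects tab-separated lines: query, subject, % identity, length, ..."""
    matches = []
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        identity, length = float(fields[2]), int(fields[3])
        if identity > min_identity and length > min_length:
            matches.append((query, subject, identity, length))
    return matches


# Hypothetical hits: end-sequence ID, library read ID, % identity, length.
example_hits = [
    "C424_fwd\tcellulose_read_1\t98.5\t100",
    "C424_fwd\tlowpH_read_7\t91.0\t100",
    "C762_rev\tcellulose_read_9\t100.0\t28",
    "C2044_rev\tagricultural_read_3\t96.2\t75",
]

kept = positive_matches(example_hits)
for query, subject, identity, length in kept:
    print(query, subject, identity, length)
```

Only the first and last example hits pass both thresholds; the second fails on identity and the third on alignment length.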

Nucleotide sequence accession numbers. Paired-end reads have been 
deposited in MG-RAST under identification no. 4482593.3 (low-pH for- 
ward), 4483544.3 (low-pH reverse), 4482599.3 (cellulose forward), 
4483820.3 (cellulose reverse), 4482600.3 (agricultural forward), and 
4483819.3 (agricultural reverse). End sequences of cosmid clones have 
been deposited in GenBank under accession no. KG771718 to KG771732. 